
Registering an agent
Only registered agents are monitored. To register an agent:

- Select + Register Agent in the top right.
- Choose an agent from the Agent dropdown.
- Toggle Automatic analysis on to run evaluation on a schedule.
- If automatic analysis is on, set the frequency: Every hour, Every 3 hours, Every 6 hours, Every 12 hours, Daily, Weekly, or Custom.
- Select Register Agent.
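
The preset frequencies above correspond to fixed intervals. As a rough illustration, the sketch below maps each label to a `timedelta` and derives a next run time from the last one. The names `FREQUENCY_INTERVALS` and `next_run` are hypothetical, and the assumption that the next run is simply the last run plus the interval is ours; the product may align runs differently.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from the UI's frequency labels to intervals.
# "Custom" is omitted because its value is user-defined.
FREQUENCY_INTERVALS = {
    "Every hour": timedelta(hours=1),
    "Every 3 hours": timedelta(hours=3),
    "Every 6 hours": timedelta(hours=6),
    "Every 12 hours": timedelta(hours=12),
    "Daily": timedelta(days=1),
    "Weekly": timedelta(weeks=1),
}


def next_run(last_run: datetime, frequency: str) -> datetime:
    """Return last_run plus the interval for a preset frequency label."""
    return last_run + FREQUENCY_INTERVALS[frequency]


last = datetime(2025, 1, 1, 9, 0)
print(next_run(last, "Every 3 hours"))  # 2025-01-01 12:00:00
```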

- Real-time issue detection from traces
- Performance bottleneck alerts
- AI-powered prompt improvement suggestions
Dashboard overview
The main dashboard gives a cross-agent view of all registered agents. Summary stats across the top show total issues from all registered agents, split into Unresolved and Resolved counts, with a severity breakdown of Critical, Medium, and Low. The agents table lists each registered agent with:

| Column | Description |
|---|---|
| Status | Live (actively monitored) or Paused |
| Issues | Total issues detected and how many of the agent’s traces were analyzed |
| Issue Severity | C / M / L count breakdown of current issues |
| Unresolved | Issues still open and needing attention |
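
To make the summary stats concrete, here is a minimal sketch of how the totals could be derived from a flat list of issues. The `Issue` record and `summarize` function are hypothetical and not part of the product; the sketch also assumes the severity breakdown counts all issues, resolved or not.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Issue:
    agent: str
    severity: str   # "Critical", "Medium", or "Low"
    resolved: bool


def summarize(issues: list[Issue]) -> dict:
    """Aggregate issues across all agents into dashboard-style totals."""
    by_severity = Counter(issue.severity for issue in issues)
    resolved = sum(1 for issue in issues if issue.resolved)
    return {
        "total": len(issues),
        "unresolved": len(issues) - resolved,
        "resolved": resolved,
        "severity": {
            level: by_severity.get(level, 0)
            for level in ("Critical", "Medium", "Low")
        },
    }


issues = [
    Issue("support-bot", "Critical", resolved=False),
    Issue("support-bot", "Low", resolved=True),
    Issue("sales-bot", "Medium", resolved=False),
]
print(summarize(issues))
```
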
Viewing agent issues
Select any agent from the table to open its detail view. The header shows:

- Agent name and live status
- Analysis interval (for example, Every 3 hours)
- Model and provider
- Total traces analyzed and when analysis last ran
Issues tab
The Issues tab lists all detected issues. Use the Severity filter to focus on Critical, Medium, or Low issues, and the Category dropdown to filter by issue type. Select Run Analysis to trigger a fresh analysis run on demand. Each row in the issues table shows:

| Column | Description |
|---|---|
| Issue | Title and a short description of what was detected |
| Severity | Critical, Medium, or Low - based on how significantly the issue affects agent quality |
| Category | Type of issue - for example, Low Task Completion, Hallucination, Knowledge Base |
| Score | Numeric score for the affected metric, with the passing threshold shown below it. A score of 0.30 with a threshold of 0.70 means the agent is well below the acceptable range |
| Trace | The trace ID where the issue was detected - select it to open Trace Details |
| Detected | When the issue was first seen |
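
The Score column compares a metric score against its passing threshold. The sketch below illustrates that comparison and the Severity and Category filters. `IssueRow`, `shortfall`, and `filter_rows` are hypothetical names, and treating a score equal to the threshold as passing is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IssueRow:
    title: str
    severity: str     # "Critical", "Medium", or "Low"
    category: str     # e.g. "Hallucination"
    score: float
    threshold: float


def passes(row: IssueRow) -> bool:
    """Assumed rule: a score at or above the threshold passes."""
    return row.score >= row.threshold


def shortfall(row: IssueRow) -> float:
    """How far the score sits below the threshold (0.0 if it passes)."""
    return round(max(row.threshold - row.score, 0.0), 2)


def filter_rows(rows, severity: Optional[str] = None,
                category: Optional[str] = None):
    """Mimic the Severity and Category filters; None means no filter."""
    return [
        r for r in rows
        if (severity is None or r.severity == severity)
        and (category is None or r.category == category)
    ]


row = IssueRow("Incomplete answer", "Critical", "Low Task Completion", 0.30, 0.70)
print(passes(row), shortfall(row))  # False 0.4
```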

Trace Details
Selecting a trace ID opens the Trace Details panel. It shows:

- The detected issue, its category tag, and a full description of why it was flagged - including evidence and what the evaluator expected to find
- Duration, total tokens, tool calls, and cost for that trace
- A trace timeline showing the span breakdown
- Agent information: name, model, and provider
Agent Hardening
The Agent Hardening tab shows AI-generated suggestions for improving the agent’s configuration. The engine analyzes patterns across detected issues - not just individual failures - and produces a consolidated suggestion that addresses the underlying root causes. Each suggestion in the list shows:

| Column | Description |
|---|---|
| Suggestion | Name of the hardened configuration |
| Status | Pending (not yet applied) or Applied |
| Fields | Which parts of the agent config are proposed to change, such as Goal or Instructions |
| Size delta | How many characters the suggested change adds or removes relative to the current config |
| Generated | When the suggestion was created |

Each suggestion also includes:

- Reasoning - a plain-language explanation of why the changes are recommended and which detected issues they address
- Expected improvements - which metrics or behaviors should improve after applying the changes, shown as tags (for example, “Task Completion: Better task fulfillment”)
- Diff view - switch between Inline Diff and Side by Side to compare the current and proposed Goal and Instructions. Additions are shown in green, removals in red.
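
The Size delta and Diff view can be approximated outside the UI. The sketch below uses Python's standard `difflib` module to compare a current and a proposed Instructions text. The function names and sample strings are hypothetical, and the product's own diff rendering may differ.

```python
import difflib


def size_delta(current: str, proposed: str) -> int:
    """Characters added (positive) or removed (negative) by the proposal."""
    return len(proposed) - len(current)


def inline_diff(current: str, proposed: str) -> list[str]:
    """Unified diff lines: '+' marks additions, '-' marks removals."""
    return list(
        difflib.unified_diff(
            current.splitlines(),
            proposed.splitlines(),
            fromfile="current",
            tofile="proposed",
            lineterm="",
        )
    )


# Hypothetical Instructions text, before and after a suggestion.
current = "Answer user questions."
proposed = "Answer user questions.\nCite the Knowledge Base when you use it."

print(size_delta(current, proposed))
print("\n".join(inline_diff(current, proposed)))
```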

Settings
The Settings tab controls how the Improvement Engine monitors an agent.
Analysis schedule
Set whether analysis runs automatically and at what frequency. The panel shows when the next scheduled run is. You can change the interval at any time - the change takes effect before the next scheduled run. Manual Run Analysis always works on demand regardless of the automatic schedule setting.
Runaway limits
Agent evaluation consumes tokens and incurs cost. Runaway limits let you set guardrails so a single expensive trace or a sustained period of high usage does not run up an unexpected bill. Enable Runaway detection to activate limits. Leaving a field blank inherits the workspace default.

Per-trace ceilings flag and stop evaluation for a single trace that exceeds a threshold:

| Ceiling | Preset options |
|---|---|
| Cost | $0.25, $0.50, $1.00, or a custom value |
| Latency | 10s, 20s, 60s, or a custom value |
| Tokens | 25,000, 50,000, 100,000, or a custom value |

Period limits cap total usage over a day or a month:

- Cost: Daily and Monthly limits
- Tokens: Daily and Monthly limits
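
The inheritance rule for blank fields and the per-trace check can be sketched as follows. The `Ceilings` class, its field names, and the trace dictionary are hypothetical. Using a strict "greater than" comparison is also an assumption, since the exact boundary behavior is not specified.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Ceilings:
    """Per-trace ceilings; None means the field was left blank."""
    cost_usd: Optional[float] = None
    latency_s: Optional[float] = None
    tokens: Optional[int] = None


def resolve(agent: Ceilings, workspace: Ceilings) -> Ceilings:
    """Fill blank agent fields from the workspace defaults."""
    merged = {}
    for f in fields(Ceilings):
        own = getattr(agent, f.name)
        merged[f.name] = own if own is not None else getattr(workspace, f.name)
    return Ceilings(**merged)


def exceeded(trace: dict, limits: Ceilings) -> list[str]:
    """Return the names of the ceilings this trace went over."""
    return [
        f.name
        for f in fields(Ceilings)
        if getattr(limits, f.name) is not None
        and trace[f.name] > getattr(limits, f.name)
    ]


workspace = Ceilings(cost_usd=1.00, latency_s=60, tokens=100_000)
agent = Ceilings(cost_usd=0.50)  # latency and tokens left blank
limits = resolve(agent, workspace)
trace = {"cost_usd": 0.62, "latency_s": 14.0, "tokens": 31_000}
print(exceeded(trace, limits))  # ['cost_usd']
```
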
Tracked metrics
Metrics are auto-selected based on the agent’s configuration. For most agents, Task Completion and Hallucinations are active by default. Tool and Knowledge Base metrics activate automatically when the agent has tools or a Knowledge Base connected.

| Metric | Module | What it checks |
|---|---|---|
| Task Completion | Base | How fully the agent accomplishes the user’s request |
| Hallucinations | Base | Detects fabricated, unverifiable, or invented claims |
| Tool Correctness | Tools | Whether the right tool was chosen at the right time |
| Argument Correctness | Tools | Precision of tool arguments - types, values, and formats |
| Contextual Relevancy | Knowledge Base | Relevance and sufficiency of retrieved context |
| Answer Relevancy | Knowledge Base | Whether the response directly addresses the user’s question |
| Knowledge Retention | Knowledge Base | Consistency and coherence across multi-step reasoning |
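
The auto-selection rule described above can be expressed as a small function. This is an illustrative sketch only: `active_metrics` and its two boolean parameters are hypothetical, and it assumes the Base metrics are always on, which the text states only for most agents.

```python
# Metric names grouped by module, taken from the table above.
BASE_METRICS = ["Task Completion", "Hallucinations"]
TOOL_METRICS = ["Tool Correctness", "Argument Correctness"]
KNOWLEDGE_BASE_METRICS = [
    "Contextual Relevancy",
    "Answer Relevancy",
    "Knowledge Retention",
]


def active_metrics(has_tools: bool, has_knowledge_base: bool) -> list[str]:
    """Return the metrics that would be tracked for an agent."""
    metrics = list(BASE_METRICS)  # assumed always active
    if has_tools:
        metrics += TOOL_METRICS
    if has_knowledge_base:
        metrics += KNOWLEDGE_BASE_METRICS
    return metrics


print(active_metrics(has_tools=True, has_knowledge_base=False))
# ['Task Completion', 'Hallucinations', 'Tool Correctness', 'Argument Correctness']
```
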
Alerts
Configure where the engine sends notifications when analysis events occur. You can add multiple email channels with different recipient lists and event subscriptions. Supported events:

- Issues found - new issues were detected in an analysis run
- Suggestion ready - a new hardening suggestion has been generated
- Analysis failed - an analysis run could not complete
- Resource runaway - a runaway limit was tripped
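
One way to picture multiple email channels with different event subscriptions is the sketch below. The `EmailChannel` class, the event identifiers, and the example addresses are hypothetical and do not reflect the product's actual configuration format.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the four supported events listed above.
EVENTS = {"issues_found", "suggestion_ready", "analysis_failed", "resource_runaway"}


@dataclass
class EmailChannel:
    name: str
    recipients: list[str]
    events: set[str]  # subset of EVENTS this channel subscribes to


def recipients_for(event: str, channels: list[EmailChannel]) -> list[str]:
    """Collect recipients subscribed to an event, without duplicates."""
    if event not in EVENTS:
        raise ValueError(f"unknown event: {event}")
    seen: list[str] = []
    for channel in channels:
        if event in channel.events:
            for address in channel.recipients:
                if address not in seen:
                    seen.append(address)
    return seen


channels = [
    EmailChannel("on-call", ["oncall@example.com"],
                 {"analysis_failed", "resource_runaway"}),
    EmailChannel("prompt-team", ["prompts@example.com", "oncall@example.com"],
                 {"issues_found", "suggestion_ready", "analysis_failed"}),
]
print(recipients_for("analysis_failed", channels))
# ['oncall@example.com', 'prompts@example.com']
```
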
Upcoming features
- Real-time monitoring - analyze each trace via webhook as it completes, without waiting for a scheduled run
- Custom judges - define your own evaluation criteria beyond the built-in tracked metrics